Lesson 4 - Web API

Requesting information from the web

Python 'requests' module.

Using 'requests' module

Use the requests module to make an HTTP request to http://www.github.com/ibm

  • Check the status of the request
  • Display the response header information

Get status code for the request


In [1]:
url = 'http://www.github.com/ibm'


200
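The cell above shows only the URL assignment, so the request that produced the 200 is not visible. A minimal sketch of how the status code could be obtained with requests.get follows; the helper name get_status and the timeout value are assumptions made for illustration, not part of the lesson.

```python
import requests

url = 'http://www.github.com/ibm'

def get_status(url, timeout=10):
    """Send a GET request and return the HTTP status code (200 means OK)."""
    response = requests.get(url, timeout=timeout)
    return response.status_code

# In a notebook cell: print(get_status(url))
```

Wrapping the call in a function keeps the network request out of the import step, so nothing is sent until the function is called.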

Get header information


In [ ]:
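One possible way to fill in this cell, assuming the same URL as above: the response object exposes the headers as a dictionary-like attribute called headers. The helper name show_headers is hypothetical.

```python
import requests

def show_headers(url, timeout=10):
    """Print each response header as 'Name: value' and return them as a dict."""
    response = requests.get(url, timeout=timeout)
    for name, value in response.headers.items():
        print(f"{name}: {value}")
    return dict(response.headers)

# In a notebook cell: show_headers('http://www.github.com/ibm')
```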

Get the body Information


In [ ]:
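A sketch for the body, under the same assumptions: response.text holds the body decoded as a string, while response.content holds the raw bytes. The helper name get_body is hypothetical.

```python
import requests

def get_body(url, timeout=10):
    """Return the response body as decoded text."""
    response = requests.get(url, timeout=timeout)
    return response.text

# In a notebook cell, print only the start of the page, since the HTML is long:
# print(get_body('http://www.github.com/ibm')[:500])
```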

Using a Web API to Collect Data

  • An application programming interface (API) is a set of functions that you call to get access to some service.
  • A web API is essentially a list of functions and data structures for interacting with a website's data.

These work much like viewing a web page. When you point your browser to a website, you do so with a URL (http://www.github.com/ibm, for instance). GitHub sends back data containing HTML, CSS, and JavaScript, and your browser uses this data to construct the page that you see. An API works similarly: you request data with a URL (https://api.github.com/orgs/ibm), but instead of HTML and the like, you get data formatted as JSON.

Access data using web APIs

Write a program to access all the public OSS projects hosted by IBM on github.com using the web API

Step 1: Access the Web API service and check rate limits


In [9]:



Response status - OK 
53
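The code that produced the output above is not shown. A minimal sketch of one way to obtain it, assuming the organisation endpoint https://api.github.com/orgs/ibm and assuming the number printed is the remaining request quota, which GitHub reports in the X-RateLimit-Remaining response header. The helper name check_api is hypothetical.

```python
import requests

API_URL = 'https://api.github.com/orgs/ibm'  # assumed endpoint for the IBM organisation

def check_api(url=API_URL, timeout=10):
    """Request the API, report the status, and return the remaining request quota."""
    response = requests.get(url, timeout=timeout)
    if response.status_code == 200:
        print('Response status - OK')
    else:
        print('Response status - error', response.status_code)
    # GitHub reports how many requests are left in this response header
    remaining = response.headers.get('X-RateLimit-Remaining')
    return int(remaining) if remaining is not None else None

# In a notebook cell: print(check_api())
```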

Step 2: Authentication (if required)

Authenticate requests to increase the API request limit. Access data that requires authentication.

Basic Authentication
  • Pass the user ID and password as parameters to the requests.get function
  • Somewhat risky and prone to hacking; consider creating a dummy user ID and password
OAuth
  • OAuth 2 is an authorization framework that enables a user to connect to their account using a third-party application
  • While this is more secure than basic authentication (i.e. passing the user ID and password with each HTTP request), it is a little more difficult to code.
  • It needs a personal token and a consumer key to be generated and passed to the web server

Unfortunately, different websites have different ways of generating and using tokens and consumer keys, so we need to write the authorization code for each website separately. However, websites generally provide detailed documentation on how to generate and send the token and keys.


In [ ]:
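Both approaches described above can be sketched with requests. The helper names and the GITHUB_TOKEN environment variable are hypothetical, and the "token ..." header format is the one GitHub accepts; other websites may expect something different. Note that GitHub's own API no longer accepts account passwords, so the Basic Authentication helper is shown for sites that still support it.

```python
import requests

def token_headers(token):
    """Build the Authorization header for token-based authentication."""
    return {'Authorization': f'token {token}'}

def get_with_basic_auth(url, user, password, timeout=10):
    """Basic authentication: requests encodes the (user, password) tuple for us."""
    return requests.get(url, auth=(user, password), timeout=timeout)

def get_with_token(url, token, timeout=10):
    """Token authentication: send the token in the request headers."""
    return requests.get(url, headers=token_headers(token), timeout=timeout)

# Hypothetical usage (needs `import os`): read the token from an environment
# variable rather than typing it into the notebook.
# token = os.environ['GITHUB_TOKEN']
# response = get_with_token('https://api.github.com/orgs/ibm', token)
```

Keeping the token out of the notebook avoids sharing it by accident when the notebook is distributed.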

Step 3: Parse the response

The json module provides functions to convert the JSON response into a Python data structure.

Write a program to get the number of OSS projects started by IBM


In [13]:



Response status - OK 
The number of public repos are :  851
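The code behind this output is not shown. A minimal sketch of the parsing step, assuming the count lives under the public_repos key of the organisation data, as it does in GitHub's response. The sample string stands in for response.text and reuses the count printed above; the helper name is hypothetical.

```python
import json

def count_public_repos(json_text):
    """Convert a JSON string into a Python dict and read the repo count."""
    org_data = json.loads(json_text)
    return org_data['public_repos']

# Illustrative stand-in for response.text; 851 is the count printed in the lesson
sample = '{"public_repos": 851}'
print('The number of public repos are : ', count_public_repos(sample))
```

With a live response object, response.json() performs the same conversion as json.loads(response.text).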

Step 4: Follow the URL information from the Web API to find what you need

Let us collect the information regarding the different projects started by IBM


In [ ]:
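One way to approach this cell, sketched under the assumption that the organisation data (already parsed into a dict) contains a repos_url entry pointing to the list of repositories, as GitHub's response does. The helper name fetch_repos is hypothetical.

```python
import requests

def fetch_repos(org_data, timeout=10):
    """Follow the 'repos_url' link and return a list of (id, name) pairs."""
    response = requests.get(org_data['repos_url'], timeout=timeout)
    response.raise_for_status()  # stop early on an error status
    return [(repo['id'], repo['name']) for repo in response.json()]
```

This returns only the first page of results; GitHub sends 30 repositories per page by default.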

Step 5: Paginate to get data from other pages

Traverse the pages if the data is spread across multiple pages


In [ ]:
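A sketch of the traversal, assuming the API announces the next page in the Link response header, as GitHub does. requests parses that header into response.links, so the loop can simply follow the 'next' entry until it disappears. The helper name fetch_all_pages is hypothetical.

```python
import requests

def fetch_all_pages(url, timeout=10):
    """Collect the items from every page by following the 'next' link."""
    items = []
    while url:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        items.extend(response.json())
        # There is no 'next' entry on the last page, which ends the loop
        url = response.links.get('next', {}).get('url')
    return items
```

Each page costs one request against the rate limit. Appending ?per_page=100 to the starting URL (the maximum GitHub allows) reduces the number of requests needed.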

3. Write a CSV

Let's try to write the repos into a CSV file.

Write code to append data row-wise to a CSV file


In [ ]:
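import csv

# Path of the CSV file to append to (adjust to your own machine)
WRITE_CSV = "C:/Users/kmpoo/Dropbox/HEC/Teaching/Python for PhD Mar 2018/python4phd/Session 2/ipython/Repo_csv.csv"

# 'at' opens the file in append-text mode; newline='' lets csv control line endings
with open(WRITE_CSV, 'at', encoding='utf-8', newline='') as csv_obj:
    write = csv.writer(csv_obj)  # Note: it is csv.writer, not csv.reader
    # Append the header row
    write.writerow(['REPO ID', 'REPO NAME'])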

In [ ]:
from google.colab import drive
from google.colab import files
drive.mount('/content/drive/')
uploaded = files.upload()

What do you think will happen if we use 'wt' as the mode instead of 'at'?
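One way to find out is to try both modes on a throwaway file. The sketch below uses a temporary directory so that no real data is touched; the helper names and the demo rows are made up for illustration.

```python
import csv
import os
import tempfile

def write_rows(path, mode, rows):
    """Open path in the given text mode and write the rows as CSV."""
    with open(path, mode, encoding='utf-8', newline='') as csv_obj:
        csv.writer(csv_obj).writerows(rows)

def read_rows(path):
    """Read the whole CSV file back as a list of rows."""
    with open(path, 'rt', encoding='utf-8', newline='') as csv_obj:
        return list(csv.reader(csv_obj))

# Throwaway file for the experiment
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Two writes in append mode
write_rows(demo_path, 'at', [['REPO ID', 'REPO NAME']])
write_rows(demo_path, 'at', [['1', 'first']])
rows_after_append = read_rows(demo_path)

# One more write, this time in write mode
write_rows(demo_path, 'wt', [['2', 'second']])
rows_after_write = read_rows(demo_path)

print(rows_after_append)
print(rows_after_write)
```

Comparing the two printed lists shows how each mode treats the rows that were already in the file.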

Write a program that saves the IBM repositories to the CSV file, so that each row is a new repository, column 1 is the ID, and column 2 is the name


In [ ]:
#Enter code here
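A possible starting point for this exercise, assuming the repositories have already been collected as a list of dicts with id and name keys (the shape GitHub's API returns). The helper name save_repos is hypothetical, and the path is passed in so that any file location can be used.

```python
import csv

def save_repos(repos, csv_path):
    """Append one row per repository: column 1 is the ID, column 2 is the name."""
    with open(csv_path, 'at', encoding='utf-8', newline='') as csv_obj:
        write = csv.writer(csv_obj)
        for repo in repos:
            write.writerow([repo['id'], repo['name']])
    return len(repos)  # number of rows written

# Hypothetical usage: save_repos(list_of_repo_dicts, WRITE_CSV)
```

Because the file is opened in append mode, running the cell twice writes every repository twice.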